For n independent trials each of which leads to a success for exactly one of k categories, with each category having a given fixed success probability, the multinomial distribution gives the probability of any particular combination of numbers of successes for the various categories. For example, it models the probability of counts for rolling a k sided die n times.
Although it's imprecise, in many fields, especially NLP, categorical distribution is often confused with multinomial distribution.
Each diagonal entry is the variance of a binomially distributed random variable, and is therefore
Multinomial distribution: https://en.wikipedia.org/wiki/Multinomial_distribution